Human Genetics and Genomics Advances
Elsevier BV
All preprints, ranked by how well they match Human Genetics and Genomics Advances's content profile, based on 70 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Mugo, J. W.; Chimusa, E. R.; Mulder, N. J.
Whole-genome or genome-wide association studies (GWAS) have become a fundamental part of modern genetics and a key method for dissecting the genetic architecture of common traits based on common polymorphisms in random populations. The variants they identify are hoped to have many uses, including a better understanding of the pathogenesis of traits, the discovery of biomarkers and protein targets, and the clinical prediction of drug treatments for global health. Questions have been raised about whether associations discovered largely in populations of European descent replicate in diverse populations and can inform medical decision-making globally, and about how efficiently current GWAS tools perform in populations with high genetic diversity, multi-wave genetic admixture, and low linkage disequilibrium (LD), such as African populations. In this study, we employ genomic data simulation to mimic structured African, European, and multi-way admixed populations and evaluate the replicability of association signals from current state-of-the-art GWAS tools in these populations. We then use the results to discuss an optimized framework for analyzing GWAS data in diverse populations and outline the implications, challenges, and opportunities these studies present for populations of non-European descent.
Carlson, J. C.; Krishnan, M.; Liu, S.; Anderson, K. J.; Zhang, J. Z.; Yapp, T.-A. J.; Chiyka, E. A.; Dikec, D. A.; Cheng, H.; Naseri, T.; Reupena, M. S.; Viali, S.; Deka, R.; Hawley, N. L.; McGarvey, S. T.; Weeks, D. E.; Minster, R. L.
Genotype imputation is fundamental to association studies, yet even gold-standard panels such as TOPMed yield good imputation for only a limited set of populations. In particular, Pacific Islanders are poorly represented in extant panels. To address this, we constructed an imputation reference panel by combining 1,285 Samoan individuals with whole-genome sequencing with 1000 Genomes Project (1KGP) individuals, creating a panel that better represents Pacific Islander, and specifically Samoan, genetic variation. We compared this panel with the 1KGP and TOPMed-R3 panels based on variants imputed from genotyping array data for 1,834 Samoan participants who were not part of any panel. The 1KGP + 1285 Samoan panel yielded up to twice as many well-imputed (r² ≥ 0.80) variants as TOPMed-R3 and 1KGP, and these variants were enriched for moderate- and high-impact variants. Imputation accuracy improved across the minor allele frequency (MAF) spectrum, most markedly for variants with 0.01 ≤ MAF ≤ 0.05. Imputation accuracy (r²) was greater for population-specific variants (high fixation index, FST) and for variants in larger haplotypes (high LD score). However, the gain in imputation accuracy over TOPMed-R3 was largest for small haplotypes (low LD score), reflecting the Samoan panel's ability to capture population-specific variation not well tagged by other panels. We also augmented the 1KGP reference panel with varying numbers of Samoan participants and found that a panel with 24 Samoans performed similarly to TOPMed-R3, and panels including 48 or more Samoans outperformed TOPMed-R3 for all variants with MAF ≥ 0.001. Meta-imputation with the TOPMed-R3 and 1285 Samoan panels performed worse than the Samoan-only panel.
We also demonstrated that the phasing of the reference panel impacts the imputation of population-specific variants when the reference panel is composed of individuals from an isolated population and not combined with ancestrally diverse haplotypes. This study identifies variants with improved imputation using population-specific reference panels and provides a framework for constructing other population-specific reference panels.
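As a minimal sketch of the headline comparison, the function below tallies "well-imputed" variants (r² ≥ 0.80) per MAF bin for one reference panel. It assumes per-variant (MAF, r²) pairs are already available from the imputation output; the bin edges, function names, and toy values are hypothetical and are not taken from the study.

```python
import bisect
from collections import Counter

# Hypothetical MAF bin edges. A variant with EDGES[i] <= MAF < EDGES[i+1] falls in
# bin i; MAF == 0.5 (the maximum possible MAF) is placed in the last bin.
EDGES = [0.001, 0.01, 0.05, 0.5]
LABELS = ["0.001-0.01", "0.01-0.05", "0.05-0.5"]


def maf_bin(maf):
    """Return the MAF bin label, or None if the MAF lies outside [0.001, 0.5]."""
    if maf < EDGES[0] or maf > EDGES[-1]:
        return None
    i = bisect.bisect_right(EDGES, maf) - 1
    return LABELS[min(i, len(LABELS) - 1)]


def count_well_imputed(variants, r2_threshold=0.80):
    """Count variants with imputation r2 >= r2_threshold in each MAF bin.

    variants: iterable of (maf, r2) pairs for one reference panel.
    """
    counts = Counter()
    for maf, r2 in variants:
        label = maf_bin(maf)
        if label is not None and r2 >= r2_threshold:
            counts[label] += 1
    return counts


# Toy, made-up values for two panels imputing the same four variants.
panel_a = [(0.002, 0.85), (0.02, 0.90), (0.03, 0.70), (0.20, 0.95)]
panel_b = [(0.002, 0.50), (0.02, 0.60), (0.03, 0.81), (0.20, 0.99)]
print(count_well_imputed(panel_a))
print(count_well_imputed(panel_b))
```

Running the same tally on each panel's output and comparing bin by bin gives the kind of per-MAF comparison described above; r² here stands for whatever per-variant accuracy metric is available.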
Link, V.; Zavaleta, Y. J. A.; Reyes, R.-J.; Ding, L.; Wang, J.; Rohlfs, R. V.; Edge, M. D.
The 20 short tandem repeat (STR) markers of the Combined DNA Index System (CODIS) are the basis of the vast majority of forensic genetics in the United States. One argument for permissive rules about collecting CODIS genotypes is that the CODIS markers are thought to contain information relevant only to identification (as a fingerprint would), with little information about ancestry or traits. However, over the past 20 years, a rapidly growing field has identified hundreds of thousands of genotype-trait associations. Here we survey the landscape of such associations surrounding the CODIS loci compared with non-CODIS STRs. We find that the regions around the CODIS markers are enriched both for known pathogenic variants (>90th percentile) and for SNPs identified as trait-associated in genome-wide association studies (GWAS) (≥95th percentile in 10 kb and 100 kb flanking regions), compared with random sets of other autosomal tetranucleotide-repeat STRs. Although it is not obvious how much phenotypic information CODIS would need to convey to strain the "DNA fingerprint" analogy, the CODIS markers, considered as a set, lie in regions unusually dense with variants that have known phenotypic associations.
Cataldo-Ramirez, C.; Lin, M.; McMahon, A.; Gignoux, C.; Weaver, T. D.; Henn, B. M.
Genome-wide association studies (GWAS) and polygenic score (PGS) development are typically constrained by the data available in biobank repositories, in which European cohorts are vastly overrepresented. Here, we increase the utility of non-European participant data within the UK Biobank (UKB) by characterizing the genetic affinities of UKB participants who self-identify as Bangladeshi, Indian, Pakistani, "White and Asian" (WA), and "Any Other Asian" (AOA), with the goal of assembling a more robust South Asian sample for future genetic analyses. We assess the relationships between genetic structure and self-selected ethnic identities and use consistent patterns of clustering in the dataset to train a support vector machine (SVM). The SVM was used to reassign n = 1,853 AOA and WA participants at the subcontinental level, increasing the sample size of the UKB South Asian group by 1,381 participants. We further leverage these samples to assess GWAS performance and PGS development. We include environmental covariates in a height GWAS by implementing a rigorous covariate selection procedure, and compare the outputs of two GWAS models: GWASnull and GWASenv. We show that PGS derived from both GWAS models yield prediction comparable to that of PGS models developed with training samples an order of magnitude larger, and that environmentally adjusted PGS models reduce the sex bias in predictive performance. In summary, we demonstrate how GWAS performance can be improved by leveraging ambiguous ethnicity codes, ancestry-matched imputation panels, and environmental covariates.
Li, S.; Fatema, K.; Nidharshan, S.; Singh, A.; Rajagopal, P. S.; Notani, D.; Takeda, D.; Hannenhalli, S.
The incidence and severity of prostate cancer (PrCa) vary substantially across ancestries. American men of African ancestry (AA) are more likely to be diagnosed with and die from PrCa than those of European ancestry (EA). Published polygenic risk scores for prostate cancer, even those based on multi-ancestry genome-wide association studies, do not address population-specific genetic mechanisms underlying PrCa risk in men of African ancestry. In particular, the role of non-coding regulatory polymorphisms in driving inter-ancestry variation in PrCa has not been sufficiently explored. Here, using a sequence-based deep learning model of prostate regulatory enhancers, we identified ~2,000 SNPs with higher alternate allele frequency in AA men that potentially affect enhancer function associated with PrCa susceptibility, as supported by our experimental validation. The identified enhancer SNPs (eSNPs) may influence PrCa development through two complementary mechanisms: 1) alternate alleles that increase enhancer activity result in immune suppression and telomere elongation, and 2) alternate alleles that decrease enhancer activity lead to de-differentiation and inhibition of apoptosis. Notably, the eSNPs tend to disrupt the binding of known prostate transcription factors, including the FOX, AR and HOX families. Lastly, the identified eSNPs can be combined into a polygenic risk score that adds value to current GWAS-based risk variants in assessing PrCa risk in independent cohorts.
Ding, X.; Singh, P.; Tran, T. N.; Fragoza, R.; Yu, H.; Schimenti, J. C.
Infertility is a heterogeneous condition, with genetic causes estimated to be involved in approximately half of cases. High-throughput sequencing (HTS) is becoming an increasingly important tool for the genetic diagnosis of diseases including idiopathic infertility; however, most rare or minor alleles revealed by HTS are variants of uncertain significance (VUS). Interpreting the functional impacts of VUS is challenging but profoundly important for clinical management and genetic counseling. To determine the consequences of population polymorphisms in key fertility genes, we functionally evaluated 11 missense variants in the genes ANKRD31, BRDT, DMC1, EXO1, FKBP6, MCM9, M1AP, MEI1, MSH4 and SEPT12 by generating genome-edited mouse models. Nine variants were classified as deleterious by most functional prediction algorithms, and two disrupted a protein-protein interaction in the yeast two-hybrid assay. Even though these genes are known to be essential for normal meiosis or spermiogenesis in mice, only one of the tested human variants (rs1460351219, encoding p.R581H in MCM9), which was observed in a male infertility patient, compromised fertility or gametogenesis in the mouse models. To explore the disconnect between predictions and outcomes, we compared the pathogenicity calls made by ten widely used algorithms for missense variants 1) present in ClinVar and 2) evaluated in mice. We found that all the algorithms performed poorly at predicting the effects of human missense variants that have been modeled in mice. These studies call for caution in the genetic diagnosis of infertile patients based primarily on pathogenicity prediction algorithms, and emphasize the need for alternative and efficient in vitro or in vivo functional validation models to classify VUS more effectively and accurately as either pathogenic or benign.
Significance: Although infertility is a substantial medical problem that affects up to 15% of couples, the potential genetic causes of idiopathic infertility have been difficult to decipher. This problem is complicated by the large number of genes that can cause infertility when perturbed, coupled with the large number of VUS present in the genomes of affected patients. Here, we present and analyze mouse modeling data for missense variants that are classified as deleterious by commonly used pathogenicity prediction algorithms but that caused no detectable phenotype when introduced into mice by genome editing. We find that augmenting pathogenicity predictions with preliminary screens for biochemical defects substantially enhanced the proportion of prioritized variants that caused phenotypes in mice. The results emphasize that, in the absence of substantial improvements in in silico prediction tools or other compelling pre-existing evidence, in vivo analysis is crucial for confidently attributing infertility to specific alleles.
Wang, X.; Sofer, T.; Frei, O.; Kaplan, R.; Perreira, K. M.; Franceschini, N.; Parada, H.; Zhou, L.; Andreassen, O. A.; Gonzalez, H.; Dale, A. M.; Broce, I. J.
Polygenic scores (PGS) offer moderate to high prediction accuracy for complex traits, but most are developed in European ancestry cohorts, which reduces their performance in populations of other ancestries. This study aimed to improve prediction of standing height, a heritable and ancestry-influenced trait, in an admixed Latino cohort (HCHS/SOL) by modeling ancestry using principal components (PCs) alongside PGS. SNPs were selected from a large European ancestry GWAS using various p-value thresholds, and weights were trained using traditional and penalized regression in the UK Biobank (UKB). PGS with PCs were trained separately in HCHS/SOL and UKB. Compared with PGS alone, modeling PGS with PCs substantially improved height prediction in HCHS/SOL (R² increase of ~0.1), whereas only mild improvements were observed in UKB (R² increase of ~0.01). These results underscore the importance of incorporating genetic ancestry into predictive models for admixed populations, particularly when the trait exhibits ancestry-specific associations.
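The core comparison can be sketched as two nested ordinary least-squares models, height ~ PGS and height ~ PGS + PCs, compared by R². This is a minimal sketch on simulated data, assuming NumPy is available; the simulation parameters and the use of in-sample R² are illustrative choices, whereas a real evaluation would train in one sample and assess R² in held-out individuals.

```python
import numpy as np


def r_squared(y, X):
    """In-sample R^2 of an OLS regression of y on X, with an intercept column added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(0)
n = 2000
pcs = rng.normal(size=(n, 2))                # simulated ancestry PCs
pgs = rng.normal(size=n) + 0.5 * pcs[:, 0]   # simulated PGS, partly correlated with ancestry
# Simulated phenotype in which ancestry contributes beyond what the PGS captures.
height = 0.4 * pgs + 0.6 * pcs[:, 0] + rng.normal(size=n)

r2_pgs = r_squared(height, pgs[:, None])
r2_pgs_pcs = r_squared(height, np.column_stack([pgs, pcs]))
print(f"PGS only: {r2_pgs:.3f}  PGS + PCs: {r2_pgs_pcs:.3f}  gain: {r2_pgs_pcs - r2_pgs:.3f}")
```

Because the models are nested, in-sample R² cannot decrease when PCs are added; in this toy setup the size of the gain depends on how much the phenotype varies with ancestry beyond what the PGS captures.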
Topaloglu, A. K.; Plummer, L.; Su, C.-W.; Kotan, L. D.; Celmeli, G.; Simsek, E.; Zhao, Y.; Stamou, M.; Anik, A.; Döger, E.; Altıncık, S. A.; Mengen, E.; Koc, A. F.; Akkus, G.; Balasubramanian, R.; Turan, I.; Seminara, S. B.; Yuksel, B.
Purpose: Idiopathic hypogonadotropic hypogonadism (IHH) is characterized by impaired reproductive maturation, and approximately half of all cases lack an identified genetic cause. We investigated the genetic basis of IHH in two large cohorts to identify novel disease-causing genes. Methods: We analyzed exome and genome sequencing data from 1,822 patients with IHH from two independent cohorts. Rare variants were filtered using pedigree-informed inheritance models. PLEKHA6 expression in the postmortem human hypothalamus was tested at the mRNA and protein levels. Functional studies assessed kisspeptin secretion in cell-based assays. Results: We identified 18 distinct PLEKHA6 variants in 24 patients from 20 unrelated families (1.3% of the cohort). Variants segregated with disease under autosomal recessive and autosomal dominant (with variable penetrance) inheritance patterns. PLEKHA6 was robustly expressed in the hypothalamus and showed clear colocalization with neurokinin B, which served as a marker for the GnRH pulse generator. Functional studies demonstrated that patient variants significantly impaired kisspeptin secretion. Conclusion: PLEKHA6 is a novel IHH gene and the first reported regulator of kisspeptin secretion from the kisspeptin-neurokinin B-dynorphin (KNDy) neurons, which have recently been established as the GnRH pulse generator. These findings establish impaired kisspeptin release as a new disease mechanism in IHH and highlight the critical role of neuropeptide trafficking in reproductive function.
Dey, K. K.; Kim, S. S.; Gazal, S.; Nasser, J.; Engreitz, J. M.; Price, A.
Deep learning models have achieved great success in predicting genome-wide regulatory effects from DNA sequence, but recent work has reported that SNP annotations derived from these predictions contribute limited unique information for human complex disease. Here, we explore three integrative approaches to improve the disease informativeness of allelic-effect annotations (predicted difference between reference and variant alleles) constructed using several previously trained deep learning models: DeepSEA, Basenji and DeepBind (and a related machine learning model, deltaSVM). First, we employ gradient boosting to learn optimal combinations of deep learning annotations, using fine-mapped SNPs and matched control SNPs (on held-out chromosomes) for training. Second, we improve the specificity of these annotations by restricting them to SNPs implicated by (proximal and distal) SNP-to-gene (S2G) linking strategies, e.g., prioritizing SNPs involved in gene regulation. Third, we predict gene expression (and derive allelic-effect annotations) from deep learning annotations at SNPs implicated by S2G linking strategies, generalizing the previously proposed ExPecto approach, which incorporates deep learning annotations based on distance to TSS. We evaluated these approaches using stratified LD score regression, using functional data in blood and focusing on 11 autoimmune diseases and blood-related traits (average N = 306K). We determined that the three approaches produced SNP annotations that were uniquely informative for these diseases/traits, despite the fact that linear combinations of the underlying DeepSEA, Basenji, DeepBind and deltaSVM blood annotations were not uniquely informative for these diseases/traits. Our results highlight the benefits of integrating SNP annotations produced by deep learning models with other types of data, including data linking SNPs to genes.
Gu, W.; Gilbertson, E.; Baranzini, S. E.; Salem, R.; Capra, J. A.
Genome-wide association studies (GWAS) have identified thousands of variants associated with complex traits, yet the majority lie in noncoding regions, making it difficult to determine their functional impact. Alterations to the three-dimensional (3D) spatial interactions among gene regulatory elements are increasingly recognized as a mechanism by which genetic variants influence gene expression. However, experimentally evaluating whether variants disrupt 3D-genome structure is not feasible at GWAS scale. To address this, we developed a computational framework that integrates GWAS summary statistics with predictions from the Akita sequence-based deep learning model of 3D chromatin contacts. We applied the framework to 9,917 genomic regions associated with human height, assessing both individual variants and haplotypes for their predicted impact on 3D genome architecture. Only a small fraction of height-associated haplotypes had substantial predicted disruption of 3D folding (17 regions, 0.17%, exceeded a disruption score of 0.1). Considering all common variants in a haplotype together generally produced greater perturbations than considering individual variants, but several highly divergent regions were driven by single variants. We highlight a variant that disrupts the binding motif at a confirmed CTCF binding site and is predicted to modify 3D genome contacts with the LCOR promoter, suggesting that 3D-genome-mediated disruption of gene regulation underlies the association with height. This work presents a scalable and interpretable strategy for integrating 3D genome modeling with GWAS, enabling investigation of this important regulatory mechanism linking noncoding genetic variation to complex traits.
Lo, Y.-C.; Chan, T. F.; Jeon, S.; Maskarinec, G.; Taparra, K.; Nakatsuka, N.; Yu, M.; Chen, C.-Y.; Lin, Y.-F.; Wilkens, L. R.; Le Marchand, L.; Haiman, C. A.; Chiang, C. W. K.
Polygenic scores (PGS) are promising for stratifying individuals by genetic susceptibility to complex diseases or traits. However, PGS models, typically trained in European- or East Asian-ancestry populations, tend to perform poorly in other ethnic minority populations, and their accuracy has not been evaluated in Native Hawaiians. Using body mass index, height, and type 2 diabetes as examples of highly polygenic traits, we evaluated the prediction accuracy of PGS models in a large Native Hawaiian sample of up to 5,300 individuals from the Multiethnic Cohort. We evaluated both publicly available PGS models and genome-wide PGS models trained in this study using the largest available GWAS. We found evidence of reduced prediction accuracy for the PGS models in some cases, particularly for height. We also found that using the Native Hawaiian samples as an optimization cohort during training did not consistently improve PGS performance. Moreover, even the best-performing PGS models among Native Hawaiians would have reduced prediction accuracy among the subset of individuals most enriched with Polynesian ancestry. Our findings indicate that factors such as admixture history, sample size, and diversity in GWAS can influence PGS performance for complex traits in Native Hawaiian samples. This study provides an initial survey of PGS performance among Native Hawaiians and exposes the current gaps and challenges in improving polygenic prediction models for underrepresented minority populations.
Hui, D.; Xiao, B.; Dikilitas, O.; Freimuth, R. R.; Irvin, M. R.; Jarvik, G. P.; Kottyan, L.; Kullo, I.; Limdi, N. A.; Liu, C.; Luo, Y.; Namjou, B.; Puckelwartz, M. J.; Schaid, D.; Tiwari, H.; Wei, W.-Q.; Verma, S. S.; Kim, D.; Ritchie, M. D.
Polygenic risk scores (PRS) have generated enthusiasm for precision medicine. However, it is well documented that PRS do not generalize across groups that differ in ancestry or sample characteristics (e.g., age). Quantifying PRS performance across groups of study participants, using genome-wide association study (GWAS) summary statistics from multiple ancestry groups and sample sizes and using different linkage disequilibrium (LD) reference panels, may clarify the factors that limit PRS transferability. To evaluate these factors in the PRS generation process, we generated body mass index (BMI) PRS (PRSBMI) in the Electronic Medical Records and Genomics network (N = 75,661). Analyses were conducted in two ancestry groups (European and African) and three age ranges (adults, teenagers, and children). For PRSBMI calculations, we evaluated five LD reference panels and three sets of GWAS summary statistics of varying sample size and ancestry. PRSBMI performance increased for both African and European ancestry individuals when using cross-ancestry rather than European-only GWAS summary statistics (6.3% and 3.7% relative R² increase, respectively; p = 0.038 for African and p = 6.26 × 10⁻⁴ for European ancestry). The effects of LD reference panels were more pronounced in African ancestry study datasets. PRSBMI performance degraded in children, whose R² was less than half that of teenagers or adults. The effect of GWAS summary statistics sample size was small when modeled with the other factors. We also explored clinical comorbidities associated with PRSBMI and identified associations with type 2 diabetes and coronary atherosclerosis. This study quantifies the effects of ancestry, GWAS summary statistics sample size, and LD reference panel on PRS performance, especially in cross-ancestry and age-specific analyses.
Lin, M.; Caberto, C.; Wan, P.; Li, Y.; Lum-Jones, A.; Tiirikainen, M.; Pooler, L.; Nakamura, B.; Sheng, X.; Porcel, J.; Lim, U.; Setiawan, V. W.; Le Marchand, L.; Wilkens, L. R.; Haiman, C. A.; Cheng, I.; Chiang, C. W. K.
Statistical imputation applied to genome-wide array data is the most cost-effective approach to completing the catalog of genetic variation in a study population. However, imputed genotypes in underrepresented populations incur greater inaccuracies due to ascertainment bias and a lack of representation among reference individuals, further hindering the study of these populations. Here we examined the consequences of this lack of representation by genotyping a functionally important, Polynesian-specific variant, rs373863828, in the CREBRF gene in a large number of self-reported Native Hawaiians (N = 3,693) from the Multiethnic Cohort. We found that the derived allele of rs373863828 was significantly associated with several adiposity traits with large effects (e.g., 0.214 s.d., or approximately 1.28 kg/m², per allele for BMI, the most significant association; P = 7.5 × 10⁻⁵). Because Polynesians are currently absent from publicly accessible reference sequences, neither rs373863828 nor any of its proxies could be tested through imputation using these existing resources. Moreover, the association signals at this Polynesian-specific variant could not be captured by alternative approaches such as admixture mapping. In contrast, highly accurate imputation can be achieved even if only a small number (<200) of Polynesian reference individuals are available. By constructing an internal set of Polynesian reference individuals, we were able to increase the sample size for analysis to up to 3,936 individuals and improve the statistical evidence of association (e.g., P = 1.5 × 10⁻⁷, 3 × 10⁻⁶, and 1.4 × 10⁻⁴ for BMI, hip circumference, and T2D, respectively). Taken together, our results suggest the alarming possibility that lack of representation in reference panels would inhibit the discovery of functionally important, population-specific loci such as CREBRF, which could nonetheless be readily detected and prioritized with improved representation of diverse populations in sequencing studies.
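The two effect-size scales reported for BMI can be reconciled with a one-line calculation. Assuming the "s.d." refers to the standard deviation of untransformed BMI in this sample (the abstract does not say), the implied BMI standard deviation is roughly:

$$
\hat{\sigma}_{\mathrm{BMI}} \approx \frac{1.28\ \mathrm{kg/m^2}}{0.214} \approx 6.0\ \mathrm{kg/m^2}
$$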
Cullina, S.; Shemirani, R.; Asgari, S.; Kenny, E. E.
Hispanic/Latino(a) (HL) and African American (AA) populations remain underrepresented in biobank-scale association studies, limiting the discovery of disease-associated genetic factors in these groups. We present here a systematic comparison of phenome-wide admixture mapping (AM) and genome-wide association studies (GWAS) using data from the diverse BioMe biobank in New York City. Our analysis highlights 77 genome-wide significant AM signals, 48 of which were not detected by GWAS, emphasizing the complementary nature of the two approaches. AM-tagged variants show significantly higher minor allele frequency and population differentiation (Fst), whereas GWAS-tagged variants show higher odds ratios, underscoring the distinct genetic architecture identified by each method. This study offers a comprehensive phenome-wide AM resource and demonstrates its utility for uncovering novel genetic associations in underrepresented populations, particularly for variants missed by traditional GWAS approaches.
Lin, Y.-S.; Tan, T.; Wang, Y.; Pasaniuc, B.; Martin, A.; Atkinson, E. G.
Polygenic scores (PGS) are widely used to estimate genetic predisposition to complex traits by aggregating the effects of common variants into a single measure. They hold promise for identifying individuals at increased risk for diseases, allowing earlier screening and intervention. Genotyping arrays, commonly used for PGS computation, are affordable and computationally efficient, while whole-genome sequencing (WGS) offers a more comprehensive view of genetic variation. In this study, we compared PGS derived from arrays and WGS across multiple traits to evaluate differences in predictive performance, portability across populations, and computational efficiency. We computed PGS for 10 traits, representing a range of heritability and polygenicity, in the three largest genetic ancestry groups in All of Us (European, African American, Admixed American), trained on multi-ancestry meta-analyses from the Pan-UK Biobank. Using the clumping and thresholding (C+T) method, we found that WGS-based PGS outperformed array-based PGS for highly polygenic traits but showed differentially reduced accuracy for sparse traits in certain populations. With the LD-informed PRS-CS method, we observed overall improved prediction performance compared with C+T, with WGS outperforming arrays across most non-cancer traits. The results obtained using PRS-CS closely align with those derived from pre-trained models in the PGS Catalog, with WGS achieving better prediction performance than array genotypes for non-sparse traits. To further investigate factors influencing differential prediction performance between arrays and WGS, we ran simulations varying the proportion of causal SNPs directly captured by each technology. These demonstrated that the proportion of causal variants genotyped dramatically affects prediction accuracy. Fine-mapping of empirical data supported this concept but also highlighted the importance of reducing non-informative variants for optimal prediction accuracy.
In conclusion, while WGS-based PGS generally offer superior predictive power with PRS-CS, the advantage over arrays is context-dependent, varying by trait, population, and PGS method. The ability of each technology to capture causal variants largely drives prediction accuracy. This study provides insights into the complexities and potential advantages of using different genotype discovery approaches for polygenic prediction across populations and informs strategies to enhance accuracy.
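For readers unfamiliar with clumping and thresholding (C+T), the sketch below shows the basic idea: keep SNPs below a p-value threshold, greedily retain the most significant SNP in each LD neighborhood, and sum effect-size-weighted dosages. It is a simplified illustration, not the implementation used in the study (production tools such as PLINK provide clumping); the single-chromosome setup, the default thresholds, the `ld_r2` lookup, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    snp_id: str
    pos: int      # base-pair position (a single chromosome is assumed)
    beta: float   # GWAS effect size per copy of the effect allele
    p: float      # GWAS p-value


def clump_and_threshold(snps, ld_r2, p_threshold=5e-8, r2_max=0.1, window_bp=250_000):
    """Simplified greedy C+T.

    SNPs with p < p_threshold are visited from most to least significant. A SNP is
    kept unless it lies within window_bp of an already-kept SNP and has LD r2 above
    r2_max with it. ld_r2(a, b) returns r2 between two SNP ids from an LD reference.
    """
    candidates = sorted((s for s in snps if s.p < p_threshold), key=lambda s: s.p)
    kept = []
    for s in candidates:
        if all(abs(s.pos - k.pos) > window_bp or ld_r2(s.snp_id, k.snp_id) <= r2_max
               for k in kept):
            kept.append(s)
    return kept


def polygenic_score(dosages, kept):
    """Sum of beta * effect-allele dosage (0-2) over kept SNPs; missing dosages count as 0."""
    return sum(s.beta * dosages.get(s.snp_id, 0.0) for s in kept)


# Hypothetical summary statistics and a toy LD lookup (rs1 and rs2 are in strong LD).
snps = [
    Snp("rs1", 1_000, 0.20, 1e-10),
    Snp("rs2", 5_000, 0.15, 1e-9),
    Snp("rs3", 900_000, -0.10, 1e-8),
    Snp("rs4", 910_000, 0.05, 0.01),
]
ld_table = {frozenset({"rs1", "rs2"}): 0.8}


def ld_r2(a, b):
    return ld_table.get(frozenset({a, b}), 0.0)


kept = clump_and_threshold(snps, ld_r2)
print([s.snp_id for s in kept])  # ['rs1', 'rs3']
print(polygenic_score({"rs1": 2, "rs3": 1}, kept))
```

Here rs2 is dropped because it is close to, and in strong LD with, the more significant rs1, and rs4 fails the p-value threshold. LD-aware shrinkage methods such as PRS-CS replace this hard pruning with a model of LD across all SNPs.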
Martinez de Lapiscina, I.; Kouri, C.; Aurrekoetxea, J.; Sanchez, M.; Naamneh Elzenaty, R.; Sauter, K. S.; Camats, N.; Grau, G.; Rica, I.; Rodriguez, A.; Vela, A.; Cortazar, A.; Alonso-Cerezo, M. C.; Bahillo, P.; Berthod, L.; Esteva, I.; Castano, L.; Flueck, C. E.
Steroidogenic factor 1 (SF-1, NR5A1) plays an important role in human sex development. Variants of NR5A1/SF-1 may cause mild to severe differences of sex development (DSD) or may be found in healthy carriers. So far, the broad phenotypic variability of DSD associated with NR5A1/SF-1 variants remains a conundrum. The NR5A1/SF-1 variant c.437G>C/p.Gly146Ala is common in individuals with DSD and has been suggested to act as a susceptibility factor for adrenal disease or cryptorchidism. However, because its allele frequency in the general population is high, and because functional testing of the p.Gly146Ala variant in vitro yielded inconclusive results, the disease-causing effect of this variant has been questioned. Nevertheless, a role as a disease modifier in concert with other gene variants is still possible, given that oligogenic inheritance has been described in patients with NR5A1/SF-1 gene variants. We therefore performed next-generation sequencing in DSD individuals harboring the NR5A1/SF-1 p.Gly146Ala variant to search for other DSD-causing variants, with the aim of clarifying the contribution of this variant to the carriers' phenotype. We studied 14 pediatric DSD individuals who carried the p.Gly146Ala variant. Panel and whole-exome sequencing were performed, and data were analyzed with a specific data filtering algorithm for detecting variants in NR5A1- and DSD-related genes. The phenotypes of the studied individuals ranged from scrotal hypospadias and ambiguous genitalia in 46,XY DSD to typical male external genitalia and ovotestes in 46,XX DSD patients. Patients were of African, Spanish, and Asian origin. Of the 14 studied subjects, five were homozygous and nine heterozygous for the NR5A1/SF-1 p.Gly146Ala variant. In ten subjects we identified either a clearly pathogenic DSD gene variant (e.g., in AR, LHCGR) or one to four potentially deleterious variants that likely explain the observed phenotype on their own (e.g., in FGFR3, CHD7, ADAMTS16).
Our study shows that most individuals carrying the NR5A1/SF-1 p.Gly146Ala variant harbor at least one other deleterious gene variant that can explain the DSD phenotype. This finding confirms that the p.Gly146Ala variant of NR5A1/SF-1 may not contribute to the pathogenesis of DSD and qualifies as a benign polymorphism. Individuals in whom the NR5A1/SF-1 p.Gly146Ala variant was previously identified as the underlying genetic cause of their DSD should therefore be re-evaluated with next-generation sequencing to reveal the actual genetic diagnosis.
Evans, L. M.; Arehart, C. H.; Gibson, R. A.; Bowman, G. I.; Gignoux, C.
Many datasets, including widely used biobanks, contain more than one observation of numerous phenotypes for at least a portion of their sample. The majority of GWAS use only a single observation per individual, even when more are available, and apply a standard model in which the additive allelic effect being estimated is assumed to be constant across the age or time range in the sample. Here, we test a set of simple approaches that use multiple observations per individual under this same assumption. We find that using the mean or median of the available observations, rather than a single observation, improves power to detect associated loci and enriched gene sets and yields higher out-of-sample polygenic score prediction accuracy. Despite the growth of biobanks, many deeply phenotyped samples are relatively small but have multiple observations. Although explicitly modeling age- or time-dependent genetic effects can estimate age- or time-specific effects, most GWAS apply a standard, additive-only model. Simply using the mean or median can improve power by reducing "noise" in the phenotype, works with standard, optimized software, and can be particularly impactful for smaller samples, including samples of diverse genetic ancestry currently in widely used biobanks.
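The aggregation step described above is simple enough to sketch directly: collapse repeated measurements to one value per individual before running a standard GWAS. This is a minimal sketch assuming long-format (individual ID, value) records; the function name and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def collapse_repeated_measures(records, how="mean"):
    """Collapse (individual_id, value) records to one phenotype value per individual.

    how: "mean" or "median" of each individual's available observations.
    """
    aggregators = {"mean": mean, "median": median}
    if how not in aggregators:
        raise ValueError(f"how must be 'mean' or 'median', got {how!r}")
    by_id = defaultdict(list)
    for individual_id, value in records:
        by_id[individual_id].append(value)
    agg = aggregators[how]
    return {ind: agg(values) for ind, values in by_id.items()}


# Toy BMI-like records: individual "A" was measured three times, "B" once.
records = [("A", 24.0), ("A", 26.0), ("A", 31.0), ("B", 22.5)]
print(collapse_repeated_measures(records, "mean"))    # {'A': 27.0, 'B': 22.5}
print(collapse_repeated_measures(records, "median"))  # {'A': 26.0, 'B': 22.5}
```

If measurement noise is independent across visits with variance σ², averaging k observations reduces that noise variance to σ²/k, which is the intuition behind the power gain; the median trades some of that reduction for robustness to outlying visits.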
Kozlowska, J.; Humphryes-Kirilov, N.; Pavlovets, A.; Connolly, M.; Kuncheva, Z.; Horner, J.; Sousa Manso, A.; Murray, C.; Fox, J. C.; McCarthy, A.
Genetic support for a drug target has been shown to increase the probability of success in drug development, with the potential to reduce attrition in the pharmaceutical industry alongside discovering novel therapeutic targets. It is therefore important to maximise the detection of genetic associations that affect disease susceptibility. Conventional statistical methods used to analyse genome-wide association studies (GWAS) identify only part of the genetic contribution to disease, so novel analytical approaches are required to extract additional insights. C4X Discovery has developed Taxonomy3®, a new method for analysing genetic datasets that is based on novel mathematics. When applied to a previously published rheumatoid arthritis GWAS dataset, Taxonomy3® identified many additional novel genetic signals associated with this autoimmune disease. Follow-up studies using tool compounds support the utility of the method in identifying novel biology and tractable drug targets with genetic support for further investigation.
Jiang, K.; Haley, E. K.; Barshad, G.; He, A.; Rogic, A.; Rice, E. J.; Sudman, M.; Thompson, S. D.; Danko, C. G.; Jarvis, J. N.
GWAS have identified multiple genetic regions that confer risk for juvenile idiopathic arthritis (JIA). However, identifying the single nucleotide polymorphisms (SNPs) that drive disease risk has been impeded by the fact that the SNPs used to identify risk loci are in linkage disequilibrium (LD) with hundreds of other SNPs. Because the causal SNPs remain unknown, it is difficult to identify target genes and thus to use genetic information to elucidate disease biology and inform patient care. We used existing genotyping data from 3,939 children with JIA and 14,412 healthy controls to identify SNPs on JIA risk haplotypes that (1) lie within open chromatin in multiple immune cell types and (2) are more common in children with JIA than in controls (p<0.05) in the genotyping data sets. We identified SNPs within cis-regulatory elements (CREs) using precision run-on sequencing (PRO-seq) data, and identified likely target genes using Micro-C in both resting and activated CD4+ T cells. We identified 138 SNPs within the PRO-seq-identified CREs and 41 genes with which these CREs physically interacted. Data from GTEx corroborated these analyses by showing allelic effects for SNPs within the CREs in the ERAP2 and IRF1 risk loci, and a luciferase reporter assay further corroborated the IRF1 allelic effects. Our findings substantially narrow the genomic search space for risk-driving variants and target genes and support roles for IRF1, ERAP2, and LNPEP in driving JIA risk.
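The two SNP-level criteria in the JIA study (location in open chromatin in multiple immune cell types, and higher frequency in cases than in controls at p<0.05) can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the abstract does not name the association test, so a 2x2 allelic chi-square test is assumed here, and the SNP identifiers, coordinates, cell types, thresholds, and allele counts are all hypothetical.

```python
import bisect
import math


def in_intervals(pos, intervals):
    """True if pos lies in one of the sorted, non-overlapping half-open intervals [start, end)."""
    starts = [start for start, _ in intervals]
    i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < intervals[i][1]


def allelic_chi2_p(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """p value of a 2x2 allelic chi-square test (1 df, no continuity correction)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def prioritize(snps, open_chromatin, min_cell_types=2, alpha=0.05):
    """Keep SNPs open in >= min_cell_types cell types and more common in cases at p < alpha."""
    kept = []
    for s in snps:
        n_open = sum(in_intervals(s["pos"], iv) for iv in open_chromatin.values())
        case_freq = s["case_alt"] / (s["case_alt"] + s["case_ref"])
        ctrl_freq = s["ctrl_alt"] / (s["ctrl_alt"] + s["ctrl_ref"])
        p = allelic_chi2_p(s["case_alt"], s["case_ref"], s["ctrl_alt"], s["ctrl_ref"])
        if n_open >= min_cell_types and case_freq > ctrl_freq and p < alpha:
            kept.append(s["id"])
    return kept


# HYPOTHETICAL open-chromatin intervals on one chromosome, per cell type.
open_chromatin = {
    "CD4_T": [(100, 200), (500, 600)],
    "CD8_T": [(120, 260)],
    "B_cell": [(700, 800)],
}
# HYPOTHETICAL SNPs with alternate/reference allele counts in cases and controls.
snps = [
    {"id": "snp_a", "pos": 150, "case_alt": 300, "case_ref": 700, "ctrl_alt": 200, "ctrl_ref": 800},
    {"id": "snp_b", "pos": 160, "case_alt": 205, "case_ref": 795, "ctrl_alt": 200, "ctrl_ref": 800},
    {"id": "snp_c", "pos": 550, "case_alt": 300, "case_ref": 700, "ctrl_alt": 200, "ctrl_ref": 800},
    {"id": "snp_d", "pos": 150, "case_alt": 200, "case_ref": 800, "ctrl_alt": 300, "ctrl_ref": 700},
]
print(prioritize(snps, open_chromatin))
```

In this toy input, snp_b fails the frequency test, snp_c is open in only one cell type, and snp_d is more common in controls, so only snp_a passes both criteria.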
Jung, Y. L.; Hung, C.; Choi, J.; Lee, E. A.; Bodamer, O.
Kabuki syndrome (KS) is a rare multisystem disorder with a variable clinical phenotype. Most KS cases are caused by dominant loss-of-function mutations in two genes, KMT2D (lysine methyltransferase 2D; KS1) and KDM6A (lysine demethylase 6A; KS2). Both KMT2D and KDM6A play a critical role in chromatin accessibility, which is essential for developmental processes and differentiation. In a previous study, we reported that KMT2D mutations could lead to increased enhancer activity in genes related to metabolic pathways in KS1. Early detection of KS is crucial for offering improved treatment options. To uncover new biomarkers that could facilitate early detection and inform clinical trial readiness, we collected and analyzed plasma and urine metabolites from 40 KS patients with pathogenic mutations in either KMT2D or KDM6A and 12 healthy controls, using an untargeted liquid chromatography with tandem mass spectrometry (LC-MS/MS) approach. Additionally, we profiled gene expression in most KS patients and controls. Our analysis revealed more than 100 significantly altered metabolites between KS patients and controls, and these metabolites clustered by genotype. Importantly, N2,N2-dimethylguanosine emerged as one of the top candidates in both KS1 and KS2 patients. We used machine learning classifiers to identify the most informative metabolites, and the trained model achieved a high level of discrimination between KS patients and controls. Furthermore, pathway analysis revealed several disrupted pathways, including pyrimidine metabolism, that were significantly altered in both the metabolome and the transcriptome in KS. Distinctive metabolites identified in KS can effectively serve as discriminative biomarkers.
Our findings provide valuable insights into the metabolic dysregulation underlying KS and highlight potential targets for further investigation and therapeutic interventions.
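With more than 100 metabolites reported as significantly altered, a screen like this normally involves a multiple-testing correction across all metabolites tested. The abstract does not say which correction, if any, was applied; the Benjamini-Hochberg false-discovery-rate procedure below is shown only as one common choice, and the metabolite names and p values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q.

    pvalues: dict mapping a feature name to its raw p value.
    Returns the names of rejected (significant) features, in input order.
    """
    m = len(pvalues)
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])  # ascending p values
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * q / m:  # BH criterion: p_(k) <= k * q / m
            cutoff_rank = rank  # keep the largest passing rank (step-up)
    rejected = {name for name, _ in ranked[:cutoff_rank]}
    return [name for name in pvalues if name in rejected]


# HYPOTHETICAL raw p values from a per-metabolite KS-versus-control test.
pvals = {
    "metabolite_A": 0.001,
    "metabolite_B": 0.035,
    "metabolite_C": 0.021,
    "metabolite_D": 0.30,
    "metabolite_E": 0.025,
}
print(benjamini_hochberg(pvals))
```

Because the procedure steps up from the largest passing rank, metabolite_C is retained even though its own p value (0.021) exceeds its rank-2 threshold (0.02): the rank-4 value (0.035) passes its threshold of 0.04, so all four smaller p values are rejected together.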